Write down a function that computes the value of the density function of a bivariate Gaussian at \((x_1,x_2)\), with zero mean vector components and unit variances, that is
NBiv <- function(x, y, rho){
  if (!is.numeric(x) || !is.numeric(y)) stop("x and y must be numeric scalars")
  if (!is.numeric(rho) || length(rho) != 1L) stop("rho must be a single number")
  if (abs(rho) >= 1) stop("rho must be a scalar in (-1, 1)")
  den <- 2 * pi * sqrt(1 - rho^2)
  num <- exp(-0.5 * (x^2 + y^2 - 2 * rho * x * y) / (1 - rho^2))
  return(num / den)
}
Bivariate Normal Distribution: 3D Surface plot and Contour levels
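The surface and contour plots on this slide can be reproduced with a short sketch like the following (the grid range and \(\rho = 0.7\) are illustrative choices, not values taken from the slides):

```r
# Standard bivariate normal density (zero means, unit variances), as in NBiv above
NBiv <- function(x, y, rho){
  den <- 2 * pi * sqrt(1 - rho^2)
  exp(-0.5 * (x^2 + y^2 - 2 * rho * x * y) / (1 - rho^2)) / den
}
x <- y <- seq(-3, 3, length.out = 50)   # evaluation grid (illustrative)
z <- outer(x, y, NBiv, rho = 0.7)       # rho = 0.7 is an arbitrary example value
persp(x, y, z, theta = 30, phi = 25,
      xlab = "x1", ylab = "x2", zlab = "density")  # 3D surface
contour(x, y, z, xlab = "x1", ylab = "x2")         # contour levels
```

The `outer` call evaluates the density on the whole grid at once, which is why the sketch drops the scalar checks of the full `NBiv` above.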
Bivariate Normal Distribution: Example
Biometric example
Anthropometric data analysis represents an important milestone of statistical analysis (in the 19th and early 20th centuries, Quetelet, Galton, Pearson, and others investigated biometric measurements)
Here, we consider two measurements, the weight and the height of an adult
We are simplifying the phenomenon (both are dependent on the gender, geographical characteristics, and so on)
Assumption
Let \(X_1\) and \(X_2\) be the r.v. representing the height and the weight measurements
We assume the random vector \((X_1,X_2)\) follows a bivariate normal distribution
\(X_1,\ldots, X_n\): sequence of independent and identically distributed (iid) random variables (rv) from a distribution having \[E(X_i) = \mu, \quad \quad \text{var}(X_i) = \sigma^2 < \infty, \quad i=1, \ldots, n\]
For large n (\(\dot \sim\) indicates convergence in distribution) \[ \bar X_n = \frac{1}{n} \sum_{i=1}^{n}X_i \, \dot \sim \, \mathcal{N}(\mu, \sigma^2/n) \quad S_n = \sum_{i=1}^{n}X_i \, \dot \sim \, \mathcal{N}(n\mu, n\sigma^2)\]
The CLT supports the normal approximation to the distribution of a random variable (rv) that can be viewed as the sum of other rvs
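As a quick illustration (a simulation sketch, not part of the Lab code), sample means of skewed iid rvs already look approximately normal for moderate \(n\); here Exp(1) is an arbitrary choice with \(\mu = \sigma^2 = 1\):

```r
set.seed(123)
n <- 50; R <- 10000
# R replications of the sample mean of n iid Exp(1) rvs (mu = 1, sigma^2 = 1)
xbar <- replicate(R, mean(rexp(n, rate = 1)))
mean(xbar)      # close to mu = 1
var(xbar)       # close to sigma^2 / n = 0.02
mean(xbar <= 1) # close to pnorm(0) = 0.5, as the CLT suggests
```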
Approximation with CLT: application
The CLT is useful for approximating quantities that are hard to compute exactly
Example
Let \(X\) and \(Y\) be two independent rv, such that \(X \sim {\rm Bin}(n,p)\) and \(Y \sim {\rm Bin}(m, q)\)
We are interested in computing the probability \(\mathbb{P}(X>Y)\)
The Normal approximation is the simplest way to compute \(\mathbb{P}(X>Y)\)
By the CLT, \(X\) and \(Y\) are approximately normal; since the difference of two independent normal rvs is again normal (a well-known probability result), \(W=X-Y\) is approximately normal, having \[\mu_W=\mu_X-\mu_Y= np-mq \quad \quad \sigma^2_W=\sigma^2_X+\sigma^2_Y= np(1-p)+mq(1-q)\]
\(\mathbb{P}(X>Y) = \mathbb{P}(W>0)\) can be computed easily
Approximation with CLT: Example
Waterpolo match
Problem: Tomorrow two professional Italian waterpolo teams, Posillipo and Pro Recco, compete against each other.
Goal: We are interested in computing the probability that Posillipo wins the next match against Pro Recco
Assumptions and quantity of interest
Let \(X\) (\(Y\)) be the random number of goals scored by Posillipo (Pro Recco), and assume that \(X, Y\) follow two independent Binomial distributions
\(X\) (\(Y\)) represents the number of shots converted in goal on the total number of shots \(n\) ( \(m\)) made by Posillipo (Pro Recco), having probability \(p\) (\(q\))
Probability that Posillipo wins: \(\mathbb{P}(X>Y)=\mathbb{P}(X-Y>0)=?\)
Approximation with CLT: Example
Waterpolo match
We adopt a simplification, and we treat the quantities \(p,q,m,n\) as known. For instance, based on historical experience we fix \[p=0.5, \quad q=0.7, \quad n=20, \quad m=20\]
Let \(W\) be the r.v., such that \(W=X-Y\), we can easily compute the probability of interest (\(\mathbb{P}(X - Y > 0) = \mathbb{P}(W>0) =?\)) leveraging the CLT \[W \approx \mathcal{N}(\mu_W = \mu_X-\mu_Y, \sigma^2_W = \sigma^2_X + \sigma^2_Y)\]
What is your guess on \(\mathbb{P}(W>0)\)?
Around 0.5;
A small value (close to 0);
A large value (close to 1)
Approximation with CLT: Example
Waterpolo match
We adopt a simplification, and we treat the quantities \(p,q,m,n\) as known. For instance, based on historical experience we fix \[p=0.5, \quad q=0.7, \quad n=20, \quad m=20\]
Let \(W\) be the r.v., such that \(W=X-Y\), we can easily compute the probability of interest (\(\mathbb{P}(X - Y > 0) = \mathbb{P}(W>0) =?\)) leveraging the CLT \[W \approx \mathcal{N}(\mu_W = \mu_X-\mu_Y, \sigma^2_W = \sigma^2_X + \sigma^2_Y)\]
Time to work with R and obtain a value for the probability of interest
In the Lab R file we will complete the missing part of the code (denoted with XXX)
Approximation with CLT: R code
p <- 0.5
q <- 0.7
n <- m <- 20
mW <- p * n - q * m
varW <- n * p * (1 - p) + m * q * (1 - q)
sdW <- sqrt(varW)
# P(X > Y) = P(W > 0) = ?
PWin_P <- pnorm(0, mean = mW, sd = sdW, lower.tail = FALSE)
PWin_P
[1] 0.09362452
Quick comments
This value of the probability is based on the assumptions we made
Moreover, in line with our guess, there is a low probability that Posillipo wins the next match against Pro Recco
Approximation with CLT: R code
PMF of \(X\) (blue) and \(Y\) (red) and the pdf of \(W=X-Y\)
curve(dnorm(x, mW, sdW), xlim = c(-12, 20), ylim = c(0, 0.3),
      xlab = "", ylab = "", cex.axis = 2.5)
points(0:n, dbinom(0:n, n, p), pch = 21, bg = 1, col = "blue")
points(0:m, dbinom(0:m, m, q), pch = 21, bg = 2)
segments(x0 = 0, y0 = -0.01, x1 = 0, y1 = dnorm(0, mW, sdW), lwd = 2)
text(14.25, 0.22, "Y", cex = 3, col = 2)
text(10, 0.2, "X", cex = 3, col = "blue")
text(-4, 0.15, "X - Y", cex = 3, col = 1)
Approximation with CLT: A Note
Continuity correction
When discrete distributions are approximated by continuous distributions, it is a good practice to apply a correction (c.c.). In this case: \[\small{\mathbb{P}(X>Y) \approx \mathbb{P}(W>0) \stackrel{c.c.}{\approx} \mathbb{P}(W > 0.5)}\]
PWin_P_cc <- pnorm(0.5, mean = mW, sd = sdW, lower.tail = FALSE)
PWin_P_cc
[1] 0.06895673
PWin_P # Without c.c.
[1] 0.09362452
In general
Let \(X\) be a discrete r.v. that you approximate with a continuous one (\(\tilde X\)); the continuity corrections are
\[\small{\mathbb{P}(X > x) \approx \mathbb{P}(\tilde X > x) \stackrel{c.c}\approx \mathbb{P}(\tilde X > x + 0.5)}\]
\[\small{\mathbb{P}(X \geq x) \approx \mathbb{P}(\tilde X \geq x) \stackrel{c.c}\approx \mathbb{P}(\tilde X > x - 0.5)}\]
\[\small{\mathbb{P}(X \leq x) \approx \mathbb{P}(\tilde X \leq x) \stackrel{c.c}\approx \mathbb{P}(\tilde X < x + 0.5)}\]
\[\small{\mathbb{P}(X < x) \approx \mathbb{P}(\tilde X < x) \stackrel{c.c}\approx \mathbb{P}(\tilde X < x - 0.5)}\]
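The effect of these corrections is easy to check numerically; a minimal sketch with an arbitrary \({\rm Bin}(20, 0.5)\) example (not one of the slide's cases):

```r
n <- 20; p <- 0.5
mu <- n * p; s <- sqrt(n * p * (1 - p))
exact   <- pbinom(12, n, p, lower.tail = FALSE)    # exact P(X > 12)
no_cc   <- pnorm(12, mu, s, lower.tail = FALSE)    # P(X~ > 12), no correction
with_cc <- pnorm(12.5, mu, s, lower.tail = FALSE)  # P(X~ > 12.5), with c.c.
c(exact, no_cc, with_cc)  # the corrected value is much closer to the exact one
```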
Approximation with CLT: A comment
Ideas?
There are several potential limitations of our probabilistic model, but one deserves your attention.
What could be one limitation of the previous analysis, leaving aside considerations on the assumptions we made?
Normal approximation when \(n\) is not large
We are leveraging the CLT for approximating the Binomial distribution with the Gaussian one
But when \(n\) is not large the approximation can be poor
The question “how large should \(n\) be for the approximation to be good?” deserves a comment: it depends on the specific conditions, and only sometimes can you follow the rule of thumb that \(n\) greater than 30/50 guarantees a good approximation
Approximation with CLT: Investigation
Ideas?
Let’s explore the quality of the approximation using our example
The Normal approximation is the simplest way to compute \(\mathbb{P}(W>0)\), but you can obtain the result differently
Obtain the exact result
Is the difference \(X-Y\) a well-known probability distribution?
No: \(X-Y\) does not follow a standard distribution, so you must derive its probability law
You can obtain the result (with a bit of effort) by using probability calculus \[\small{\mathbb{P}(W = w) = \sum_{x=\max(0,w)}^{\min(n,\, m+w)} \binom{n}{x} p^{x}(1-p)^{n-x} \binom{m}{x-w} q^{x-w}(1-q)^{m - (x-w)}}\]
Exact result: Naive R implementation
Obtain the probability distribution
There is much room to improve this naive R implementation
Px <- dbinom(0:n, n, p = p)
Py <- dbinom(0:m, m, p = q)
Pw <- rep(NA, 2 * n + 1)
count <- 1; count2 <- 2 * n + 1
for(i in 0:n){
  idx1 <- 1:(i + 1)
  idx2 <- (n + 1 - i):(n + 1)
  if(i == n){
    Pw[count] <- sum(Px[idx1] * Py[idx2])
  } else {
    Pw[count] <- sum(Px[idx1] * Py[idx2])
    Pw[count2] <- sum(Px[idx2] * Py[idx1])
  }
  count <- count + 1
  count2 <- count2 - 1
}
sum(Pw) # check
[1] 1
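One possible improvement is a vectorized sketch that implements the convolution formula directly (parameter values repeated here so the chunk is self-contained):

```r
p <- 0.5; q <- 0.7; n <- m <- 20   # same setting as above
# P(W = w) = sum_x P(X = x) P(Y = x - w), over the admissible x
Pw2 <- sapply(-m:n, function(w){
  x <- max(0, w):min(n, m + w)
  sum(dbinom(x, n, p) * dbinom(x - w, m, q))
})
sum(Pw2)  # check: should be 1
```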
Exact result: Comparisons
Compare approximate solutions (with and without c.c.) with the exact one
Slight difference between the exact solution and the approximated one
sum(Pw[(n + 2):length(Pw)]) # exact P(W > 0)
[1] 0.06995932
PWin_P # approx. P(W > 0)
[1] 0.09362452
PWin_P_cc # approx. with c.c. P(W > 0)
[1] 0.06895673
Exact result: Graphical Comparisons
plot(-n:m, Pw, xlab = "w", col = "red", cex.axis = 1.5, cex = 2)
curve(dnorm(x, mW, sdW), add = TRUE)
The role of \(n\) on the approximation
Let’s explore a case with smaller \(n\)
Leaving unchanged the rest, consider \(n = m = 5\)
The quality of the approximation (without c.c.) deteriorates
sum(Pw[(n + 2):length(Pw)]) # exact P(W > 0)
[1] 0.1606744
PWin_P # approx. P(W > 0)
[1] 0.2548257
PWin_P_cc # approx. with c.c. P(W > 0)
[1] 0.1613143
Exact result: Graphical Comparisons
plot(-n:m, Pw, xlab = "w", col = "red")
curve(dnorm(x, mW, sdW), add = TRUE)
The role of \(n\) on the approximation
Let’s explore a case with larger n
Leaving unchanged the rest, consider \(n = m = 100\)
The quality of the approximation (without c.c.) improves remarkably
sum(Pw[(n + 2):length(Pw)]) # exact P(W > 0)
[1] 0.001365914
PWin_P
[1] 0.00159485
PWin_P_cc
[1] 0.001253232
Extra: A different approach?
Waterpolo match
Rather than specifying in advance the total numbers of shots and the shot-conversion probabilities, i.e. 4 parameters, one could be tempted to use a more flexible distribution that accounts only for the scoring intensity, regardless of the number of shots
For this purpose, the Poisson distribution seems suitable. We may assume two independent Poisson distributions for the number of goals of the upcoming match: \(X\sim \text{Pois}(\lambda), \ Y \sim \text{Pois}(\mu)\)
At this stage, we need to specify the rates; for instance, based on our own knowledge of waterpolo scoring abilities, we fix \(\lambda=5, \mu=7\)
Extra: A different approach?
Exercise
Leveraging the Poisson assumption on the number of goals scored, how can we now compute the winning probability for Posillipo, \(\mathbb{P}(X>Y)=\mathbb{P}(X-Y>0)\)?
Obtain \(\mathbb{P}(X-Y>0)\) using the exact and approximate solution and compare them
Hint to obtain the exact solution
We may use the following probability result: \(Z = X-Y \sim \text{PD}(\lambda-\mu, \lambda+\mu)\), where \(\text{PD}\) stands for the Poisson difference distribution, also known as Skellam distribution, with mean \(\lambda-\mu\) and variance \(\lambda+\mu\) (use the skellam R package)
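As a sketch of the exact route, one can also condition on \(Y\) using base R only (the truncation bound 200 is an arbitrary safe choice; the `skellam` cross-check assumes that package is installed):

```r
lambda <- 5; mu <- 7
# P(X > Y) = sum_y P(Y = y) * P(X > y); truncate where the Poisson mass is negligible
y <- 0:200
p_win <- sum(dpois(y, mu) * ppois(y, lambda, lower.tail = FALSE))
p_win
# cross-check via the Skellam distribution (requires the skellam package):
# skellam::pskellam(0, lambda1 = lambda, lambda2 = mu, lower.tail = FALSE)
```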
Law of large numbers
Law of large numbers (LLN)
Recall the LLN
Let \(X_1, \ldots, X_n\) be a sequence of independent and identically distributed r.v., such that \(E[X_i] =\mu\) and \(\text{var}(X_i) = \sigma^2 < +\infty\). Then, given the sample mean \[\bar X_n = \frac{1}{n}\sum_{i=1}^n X_i, \] we have that, for \(\epsilon>0\), \[\lim_{n \to \infty} \mathbb{P}\left(|\bar X_n - \mu| > \epsilon\right) = 0\]
Suppose we toss a coin \(n\) times and let \(k\) be the number of heads obtained. Then \(k/n\) is the proportion of heads in \(n\) tosses. If the coin is fair, we can guess that the proportion is not too far from 0.5 (it is unlikely to be exactly 0.5). Although, by chance, we may observe anomalies, represented by a large or small number of heads, by increasing the number of tosses such strange behavior disappears.
Law of large numbers (LLN): Example
LLN_ex <- function(n, p = 0.5, seed = 1234){
  par(mfrow = c(1, 2), mar = c(4, 4, 1, 1))
  set.seed(seed)
  # Each trajectory is a realization of a sequence of rv
  # First trajectory
  x <- sample(c(0, 1), n, prob = c(1 - p, p), replace = TRUE)
  plot(1:n, cumsum(x)/(1:n), type = "l", ylim = c(0, 1),
       ylab = expression(n[Head]/n), xlab = "n")
  abline(h = .5, lty = 2, col = "red")
  # Second trajectory
  x <- sample(c(0, 1), n, prob = c(1 - p, p), replace = TRUE)
  points(1:n, cumsum(x)/(1:n), type = "l", col = "grey")
  plot(1:n, cumsum(x)/(1:n), type = "l", ylim = c(0, 1),
       ylab = expression(n[Head]/n), xlab = "n", col = "gray80")
  # Instead, considering 100 trajectories
  for(i in 1:100){
    x <- sample(c(0, 1), n, prob = c(1 - p, p), replace = TRUE)
    points(1:n, cumsum(x)/(1:n), type = "l", col = "gray80", lwd = .5)
  }
  abline(h = .5, lty = 2, col = "red")
}
Law of large numbers (LLN): Example \((n = 500)\)
n <- 500
LLN_ex(n)
Law of large numbers (LLN): Example \((n = 5000)\)